tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification
Abstract
The use of background knowledge is largely unexploited in text classification tasks. This paper explores word taxonomies as a means for constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short text classification problems: prediction of gender, personality type, age, news topics, drug side effects, and drug effectiveness. The constructed semantic features, in combination with fast linear classifiers, tested against strong baselines such as hierarchical attention neural networks, achieve comparable classification results on short text documents. The algorithm’s performance is also tested in a few-shot learning setting, indicating that the inclusion of semantic features can improve performance in data-scarce situations. The capability of tax2vec to extract corpus-specific semantic keywords is also demonstrated. Finally, we investigate the semantic space of potential features, where we observe a similarity with the well-known Zipf’s law.
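The general idea of taxonomy-based semantic features can be illustrated with a minimal sketch. This is not the tax2vec implementation: the toy taxonomy, the function names (`hypernym_chain`, `semantic_features`, `vectorize`), and the plain count weighting are all assumptions made for illustration, and the sketch omits the feature selection and parallelism a real system would need.

```python
from collections import Counter

# Hypothetical toy taxonomy (an assumption, not from the paper):
# each term maps to its parent (hypernym).
TOY_TAXONOMY = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "drug": "substance",
    "headache": "symptom",
    "nausea": "symptom",
    "symptom": "condition",
}


def hypernym_chain(word, taxonomy):
    """Return all ancestors of `word` in the taxonomy, nearest first."""
    chain = []
    seen = {word}
    while word in taxonomy:
        word = taxonomy[word]
        if word in seen:  # guard against accidental cycles
            break
        seen.add(word)
        chain.append(word)
    return chain


def semantic_features(document, taxonomy):
    """Count how often each hypernym is reached from the document's tokens."""
    counts = Counter()
    for token in document.lower().split():
        counts.update(hypernym_chain(token, taxonomy))
    return counts


def vectorize(documents, taxonomy):
    """Build a dense document-by-hypernym count matrix."""
    per_doc = [semantic_features(d, taxonomy) for d in documents]
    vocabulary = sorted(set().union(*per_doc)) if per_doc else []
    matrix = [[counts[term] for term in vocabulary] for counts in per_doc]
    return vocabulary, matrix


if __name__ == "__main__":
    vocab, rows = vectorize(["aspirin and ibuprofen", "nausea"], TOY_TAXONOMY)
    print(vocab)
    print(rows)
```

Features of this kind could be appended to ordinary bag-of-words vectors before training a linear classifier. Because each column corresponds to a named taxonomy concept, the resulting features remain human-readable.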
Similar resources
On Constructing Shallow Taxonomies from Social Annotations
Tagging in social media systems has been demonstrated to be a convenient way for users to annotate objects of interest. One reason behind its success is obviously that tags can be chosen by users arbitrarily, without any topic and specificity constraints. Although tags are free-form keywords, there is some evidence suggesting that, for a particular object type, users tend to use “similar” tag sets....
Extracting Interpretable Features for Early Classification on Time Series
Early classification on time series data has been found highly useful in a few important applications, such as medical and health informatics, industry production management, safety and security management. While some classifiers have been proposed to achieve good earliness in classification, the interpretability of early classification remains largely an open problem. Without interpretable fea...
Towards Acquiring Case Indexing Taxonomies From Text
Taxonomic case-based reasoning is a conversational case-based reasoning methodology that employs feature subsumption taxonomies for incremental case retrieval. Although this approach has several benefits over standard retrieval approaches, methods for automatically acquiring these taxonomies from text documents do not exist, which limits its widespread implementation. To accelerate and simplify ...
Selecting Features for Ordinal Text Classification
We present four new feature selection methods for ordinal regression and test them against four different baselines on two large datasets of product reviews.
Boosting for Text Classification with Semantic Features
Current text classification systems typically use term stems for representing document content. Ontologies allow the usage of features on a higher semantic level than single words for text classification purposes. In this paper we propose such an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting, a successful machine learning tec...
Journal
Journal title: Computer Speech & Language
Year: 2021
ISSN: 1095-8363, 0885-2308
DOI: https://doi.org/10.1016/j.csl.2020.101104